
Copenhagen University Research Information System

An efficient genetic algorithm for structural RNA pairwise alignment and its application to non-coding RNA discovery in yeast

Author: A Harmanci
A Taneda
A Uzilov
A Wilm
Akito Taneda
B Knudsen
C Lu
C Notredame
C Selig
CC Chang
CMA Davis Jr
D Dalli
D Rose
D Sankoff
DE Goldberg
E Rivas
E Torarinsson
F Miura
G Gonsalvez
H Kiryu
I Hofacker
I Holmes
J Cherry
J Gorodkin
J Havgaard
J Pedersen
J Schultz
J Thompson
K Katoh
K Missal
L David
M Bauer
M Gerstein
M Samanta
P Carninci
R Dowell
R Klein
R Nussinov
S Needleman
S Washietl
S Will
W Gish
X Xu
Y Tabei
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Aligning RNA sequences with low sequence identity has been a challenging problem, because such an alignment must take structural conservation into account, which requires algorithms of high computational complexity. Although many sophisticated algorithms for this purpose have been proposed, further improvements in efficiency are needed to accelerate large-scale applications such as non-coding RNA (ncRNA) discovery.

Results: We developed a new genetic algorithm, Cofolga2, for simultaneously computing pairwise RNA sequence alignment and consensus folding, and benchmarked it using BRAliBase 2.1. The benchmark results showed that our new algorithm is accurate and efficient in both time and memory usage. Then, combined with an SVM that we trained for this purpose, we applied the new algorithm to novel ncRNA discovery, comparing the S. cerevisiae genome with six related genomes in a pairwise manner. By focusing our search on relatively short regions (50 bp to 2,000 bp) sandwiched between conserved sequences, we predicted 714 intergenic and 1,311 sense or antisense ncRNA candidates, which were found in pairwise alignments with stable consensus secondary structure and low sequence identity (≤ 50%). By comparison with previous predictions, we found that more than 92% of the candidates are novel. The estimated false-positive rate among the predicted candidates is 51%. Twenty-five percent of the intergenic candidates have support for expression in the cell, i.e. their genomic positions overlap those of experimentally determined transcripts reported in the literature. Moreover, by manual inspection of the results, we obtained four multiple alignments with low sequence identity that reveal consensus structures shared by three species/sequences.

Conclusion: The present method provides an efficient tool complementary to sequence-alignment-based ncRNA finders.
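The candidate filter above keeps only pairwise alignments with low sequence identity (≤ 50%). The abstract does not say how identity was computed, so the Python sketch below assumes one common convention: identical residue columns divided by the number of columns that are not gaps in both rows. The function names `pairwise_identity` and `is_low_identity` are hypothetical, and this is not Cofolga2's code.

```python
def pairwise_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent identity of two gapped, equal-length alignment rows.

    Assumed convention (hypothetical): identical columns divided by the
    number of columns where at least one row has a residue.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # Treat DNA and RNA letters alike (T == U) and ignore case
    a = row_a.upper().replace("T", "U")
    b = row_b.upper().replace("T", "U")
    matches = columns = 0
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue  # all-gap column, e.g. from a slice of a multiple alignment
        columns += 1
        if x == y:  # both-gap columns were skipped, so equal means same residue
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


def is_low_identity(row_a: str, row_b: str, threshold: float = 50.0) -> bool:
    """True if the alignment is at or below the identity threshold."""
    return pairwise_identity(row_a, row_b) <= threshold
```

For example, `pairwise_identity("ACGU-A", "ACGAUA")` counts 4 identical columns out of 6, so the pair would not pass a ≤ 50% filter.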

Directory of Open Access Journals

Effects of using coding potential, sequence conservation and mRNA structure conservation for predicting pyrrolysine containing genes

Author: B Chaudhuri
B Knudsen
C Notredame
Christian Theil Have
DG Longstaff
E Torarinsson
EP Nawrocki
GV Kryukov
Henning Christiansen
I Hofacker
IL Hofacker
IU Heinemann
J Atkins
J Reeder
JA Krzycki
JD Thompson
K Katoh
M Bauer
M Fujita
M Höchsmann
MA Gaston
N Wirth
S Bernhart
S Lindgreen
S Mørk
S Will
SE Seemann
SF Altschul
Sine Zambach
T Abe
TM Martin Simonsen
X Xu
Y Zhang
Z Yao
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

BACKGROUND: Pyrrolysine (the 22nd amino acid) is, in certain organisms and under certain circumstances, encoded by the amber stop codon, UAG. The circumstances driving pyrrolysine translation are not well understood. A predicted mRNA structure in the region downstream of UAG has been suggested to be involved, but the structure does not seem to be present in all pyrrolysine-incorporating genes. RESULTS: We propose a strategy to predict pyrrolysine-encoding genes in the genomes of archaea and bacteria. We cluster open reading frames interrupted by the amber codon based on sequence similarity. We rank these clusters according to several features that may influence pyrrolysine translation. We assess the ranking effects of the different features and propose a weighted combination of them that best explains the currently known pyrrolysine-incorporating genes. We devote special attention to the effect of structural conservation and provide further evidence that structural conservation may be influential but is not a necessary factor. Finally, from the weighted ranking, we identify a number of potentially pyrrolysine-incorporating genes. CONCLUSIONS: We propose a method for predicting pyrrolysine-incorporating genes in the genomes of bacteria and archaea, leading to insights into the factors driving pyrrolysine translation and to the identification of new gene candidates. The method predicts known conserved genes with high recall and predicts several other promising candidates for experimental verification. The method is implemented as a computational pipeline, which is available on request.
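The ranking step combines several per-cluster features into one weighted score. As a minimal sketch, assume the three features named in the title (coding potential, sequence conservation and mRNA structure conservation) have already been computed and scaled to [0, 1]. The weights below are placeholders, not the fitted values from the paper, and `Cluster`, `weighted_score` and `rank_clusters` are hypothetical names.

```python
from dataclasses import dataclass, field

# Placeholder weights (hypothetical); the paper's fitted weights are not given here.
WEIGHTS = {
    "coding_potential": 0.5,
    "sequence_conservation": 0.3,
    "structure_conservation": 0.2,
}


@dataclass
class Cluster:
    """A cluster of amber-interrupted ORFs with pre-computed feature scores."""
    name: str
    features: dict = field(default_factory=dict)  # feature name -> score in [0, 1]


def weighted_score(cluster: Cluster, weights: dict = WEIGHTS) -> float:
    """Linear combination of feature scores; a missing feature contributes 0."""
    return sum(w * cluster.features.get(name, 0.0) for name, w in weights.items())


def rank_clusters(clusters: list, weights: dict = WEIGHTS) -> list:
    """Return clusters sorted from highest to lowest weighted score."""
    return sorted(clusters, key=lambda c: weighted_score(c, weights), reverse=True)


if __name__ == "__main__":
    demo = [
        Cluster("c3", {"coding_potential": 0.2, "sequence_conservation": 0.1,
                       "structure_conservation": 1.0}),
        Cluster("c1", {"coding_potential": 0.9, "sequence_conservation": 0.8}),
        Cluster("c2", {"coding_potential": 0.4, "sequence_conservation": 0.9,
                       "structure_conservation": 0.9}),
    ]
    for c in rank_clusters(demo):
        print(c.name, round(weighted_score(c), 2))
```

With these placeholder weights the demo ranks `c1` first even though it has no structure-conservation score, which shows how a linear combination lets structure contribute to the ranking without being required.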

Roskilde Universitet

Multiple alignment and structure prediction of non-coding RNA sequences

Author: Anders Krogh
D Sankoff
DH Mathews
E Torarinsson
JH Havgaard
Paul P Gardner
S Will
Stinus Lindgreen
X Xu
Publication venue: Springer Science and Business Media LLC

PETcofold: predicting conserved interactions and structures of two multiple alignments of RNA sequences

Author: Alkan
Altschul
Andreas S. Richter
Andronescu
Argaman
Bachellerie
Backofen
Bernhart
Bompfünewerer
Brunel
Busch
Byun
Chitsaz
Dirks
Felsenstein
Gardner
Gaspin
Geissmann
Gesell
Gorodkin
Hertel
Hofacker
Horler
Huang
Hüttenhofer
Jan Gorodkin
Kato
Katoh
Knudsen
Kolbe
Lestrade
Li
Matthews
Menzel
Mercer
Mückstein
Pervouchine
Ravasi
Rehmsmeier
Richter
Rolf Backofen
Salari
Seemann
Sharma
Stefan E. Seemann
Tafer
Taft
Tanja Gesell
The ENCODE Project Consortium
Torarinsson
Tycowski
Udekwu
Večerek
Vinh
Vitali
Washietl
Waterhouse
Waters
Watson
Weinberg
Will
Wilusz
Zuker
Publication venue: Oxford University Press
Publication date: 01/01/2011

Motivation: Predicting RNA–RNA interactions is essential for determining the function of putative non-coding RNAs. Existing methods for predicting interactions are all based on single sequences. Since comparative methods have already proved useful in RNA structure determination, we assume that conserved RNA–RNA interactions also imply conserved function. We further assume that a non-negligible fraction of these conserved interactions have also acquired compensating base changes during evolution. We implement a method, PETcofold, that takes covariance information in intra-molecular and inter-molecular base pairs into account to predict the interactions and secondary structures of two multiple alignments of RNA sequences.
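The covariance signal described above comes from compensating base changes: two sequences that both keep a pair of columns base-paired while differing at both positions. The Python sketch below counts such events for one pair of alignment columns. It assumes canonical Watson-Crick and G-U pairs, the same sequence order in both columns (for an inter-molecular pair, column `i` would come from one alignment and column `j` from the other), and gaps treated as non-pairing. It illustrates the idea only and is not PETcofold's scoring scheme; `compensatory_changes` is a hypothetical name.

```python
from itertools import combinations

# Canonical Watson-Crick pairs plus G-U wobble pairs (assumed pairing rules)
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}


def compensatory_changes(col_i: str, col_j: str) -> int:
    """Count sequence pairs that keep (i, j) base-paired while differing at both positions.

    col_i[k] and col_j[k] are the residues of sequence k at columns i and j.
    Gap characters never form a pair, so gapped sequences are never counted.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must cover the same sequences")
    bases = [(a.upper(), b.upper()) for a, b in zip(col_i, col_j)]
    count = 0
    for (a1, b1), (a2, b2) in combinations(bases, 2):
        both_pair = (a1, b1) in CANONICAL and (a2, b2) in CANONICAL
        if both_pair and a1 != a2 and b1 != b2:
            count += 1  # e.g. G-C in one sequence, A-U in the other
    return count
```

Changes on only one side of a pair (for example G-C to G-U) are deliberately not counted here, since they are consistent with pairing but provide weaker covariation evidence.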

CiteSeerX

Copenhagen University Research Information System

De Novo Discovery of Structured ncRNA Motifs in Genomic Sequences

Author: A Marchler-Bauer
A Stamatakis
AF Bompfünewerer
BJ Parker
CB Do
D Sankoff
E Rivas
E Torarinsson
EE Regulski
ENCODE Project Consortium
EP Nawrocki
G Lunter
HH Tseng
IL Hofacker
J Gorodkin
JE Barrick
JH Havgaard
JS Mattick
JX Wang
K Missal
M Blanchette
MM Meyer
N Sudarsan
P Anandam
PP Gardner
R Durbin
S Griffiths-Jones
S Washietl
S Will
SHF Bernhart
SR Eddy
T Babak
V Gowri-Shankar
WJ Kent
Y Ji
Y Sakakibara
Y Sun
Z Weinberg
Z Yao
ZJ Lu
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Copenhagen University Research Information System

Genome-scale NCRNA homology search using a Hamming distance-based filtration strategy

Author: AF Bompfunewerer
Alex Liu
B Langmead
B Ma
D Sankoff
E Rivas
E Torarinsson
J Buhler
JH Havgaard
Jikai Lei
JS Pedersen
KC Pang
Osama Aljawad
P Gardner
P Schattner
PG Higgs
R Klein
R Li
S Chikkagoudar
S Griffiths-Jones
S Schwartz
S Washietl
SF Altschul
T Coenye
Y Sun
Yanni Sun
ZJ Lu
Publication venue: BioMed Central
Publication date: 01/01/2012

Public Library of Science (PLOS)

Discovering cis-Regulatory RNAs in Shewanella Genomes by Support Vector Machines

An increasing number of cis-regulatory RNA elements have been found to regulate gene expression post-transcriptionally in various biological processes in bacterial systems. Effective computational tools for large-scale identification of novel regulatory RNAs are strongly desired to facilitate our exploration of gene regulation mechanisms and regulatory networks. We present a new computational program named RSSVM (RNA Sampler+Support Vector Machine), which employs Support Vector Machines (SVMs) to efficiently distinguish functional RNA motifs from random RNA secondary structures. RSSVM uses a set of distinctive features to represent the common RNA secondary structure and structural alignment predicted by RNA Sampler, a tool for accurate common RNA secondary structure prediction, and is trained with functional RNAs from a variety of bacterial RNA motif/gene families covering a wide range of sequence identities. When tested on a large number of known and random RNA motifs, RSSVM shows a significantly higher sensitivity than other leading RNA identification programs while maintaining the same false positive rate. RSSVM performs particularly well on sets with low sequence identities. The combination of RNA Sampler and RSSVM provides a new, fast, and efficient pipeline for large-scale discovery of regulatory RNA motifs. We applied RSSVM to multiple Shewanella genomes and identified putative regulatory RNA motifs in the 5′ untranslated regions (UTRs) in S. oneidensis, an important bacterial organism with extraordinary respiratory and metal reducing abilities and great potential for bioremediation and alternative energy generation. From 1002 sets of 5′-UTRs of orthologous operons, we identified 166 putative regulatory RNA motifs, including 17 of the 19 known RNA motifs from Rfam, an additional 21 RNA motifs that are supported by literature evidence, 72 RNA motifs overlapping predicted transcription terminators or attenuators, and other candidate regulatory RNA motifs.
Our study provides a list of promising novel regulatory RNA motifs potentially involved in post-transcriptional gene regulation. Combined with the previous cis-regulatory DNA motif study in S. oneidensis, this genome-wide discovery of cis-regulatory RNA motifs may offer more comprehensive views of gene regulation at a different level in this organism. The RSSVM software, predictions, and analysis results on Shewanella genomes are available at http://ural.wustl.edu/resources.html#RSSVM
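RSSVM classifies candidate motifs with an SVM trained on feature vectors. The abstract does not specify the kernel, features or solver, so the sketch below swaps in the simplest case: a linear SVM trained with Pegasos-style stochastic sub-gradient descent on the hinge loss, in pure Python. The two-dimensional toy features stand in for hypothetical structure and alignment scores, and `train_linear_svm` and `predict` are illustrative names, not part of RSSVM.

```python
import random


def train_linear_svm(X, y, lam=0.01, iterations=1000, seed=0):
    """Pegasos-style stochastic sub-gradient training of a linear SVM.

    X: list of feature vectors; y: labels in {+1, -1}. A constant 1.0
    feature is appended so the last weight acts as a (regularised) bias.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    for t in range(1, iterations + 1):
        x, label = rng.choice(data)          # one random training example
        eta = 1.0 / (lam * t)                # decreasing step size
        margin = label * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1.0 - eta * lam) * wi for wi in w]  # regularisation shrink
        if margin < 1.0:                     # hinge loss active: move toward example
            w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 for a predicted functional motif, -1 otherwise."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Toy, hypothetical feature vectors: +1 = functional, -1 = random
    X = [(2.0, 2.1), (1.8, 2.3), (2.2, 1.9), (-2.0, -1.9), (-2.1, -2.2), (-1.9, -2.0)]
    y = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(X, y)
    print([predict(w, x) for x in X])
```

A kernel SVM (as provided by standard SVM libraries) would be the usual choice when the classes are not linearly separable in the chosen features; the linear version is used here only because it fits in a few self-contained lines.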

Directory of Open Access Journals

Digital Commons@Becker

RNAG: a new Gibbs sampler for predicting RNA secondary structure for unaligned sequences

Author: Bernhart
Bindewald
Carvalho
Cary
Charles E. Lawrence
Chenna
Ding
Do
Donglai Wei
Eddy
Gardner
Geman
Giegerich
Griffiths-Jones
Gutell
Hamada
Hofacker
Ji
Kiryu
Knudsen
Lauren V. Alpert
Lindgreen
Liu
Mathews
Meyer
Nawrocki
Newberg
Sakakibara
Sankoff
Seemann
Siebert
Steffen
Tabaska
Torarinsson
Webb
Webb-Robertson
Will
Xing
Yao
Zuker
Publication venue: Oxford University Press
Publication date: 01/01/2011

Motivation: RNA secondary structure plays an important role in the function of many RNAs, and structural features are often key to their interaction with other cellular components. Thus, there has been considerable interest in the prediction of secondary structures for RNA families. In this article, we present a new global structural alignment algorithm, RNAG, to predict consensus secondary structures for unaligned sequences. It uses a blocked Gibbs sampling algorithm, which has a theoretical advantage in convergence time. This algorithm iteratively samples from the conditional probability distributions P(Structure | Alignment) and P(Alignment | Structure). Not surprisingly, there is considerable uncertainty in the high-dimensional space of this difficult problem, an issue that has so far received limited attention in this field. We show how the samples drawn from this algorithm can be used to more fully characterize the posterior space and to assess the uncertainty of predictions.
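The alternation between P(Structure | Alignment) and P(Alignment | Structure) is a two-block Gibbs sampler. The toy Python sketch below shows the same scheme on a tiny, invented joint distribution over a binary "alignment" variable and a binary "structure" variable. Real RNAG blocks are entire alignments and consensus structures, which this sketch does not attempt to model; `JOINT` and `blocked_gibbs` are hypothetical names.

```python
import random
from collections import Counter

# Invented joint distribution P(alignment, structure) over two binary blocks.
JOINT = {
    (0, 0): 0.4, (0, 1): 0.1,
    (1, 0): 0.1, (1, 1): 0.4,
}


def sample_conditional(rng, weights):
    """Draw one key from a dict of unnormalised weights."""
    keys = list(weights)
    return rng.choices(keys, weights=[weights[k] for k in keys])[0]


def blocked_gibbs(n_samples, burn_in=1000, seed=0):
    """Alternate draws of s ~ P(s | a) and a ~ P(a | s); return visit frequencies."""
    rng = random.Random(seed)
    a, s = 0, 0
    counts = Counter()
    for step in range(burn_in + n_samples):
        # Block 1: structure given the current alignment
        s = sample_conditional(rng, {sv: JOINT[(a, sv)] for sv in (0, 1)})
        # Block 2: alignment given the newly sampled structure
        a = sample_conditional(rng, {av: JOINT[(av, s)] for av in (0, 1)})
        if step >= burn_in:
            counts[(a, s)] += 1
    return {state: counts[state] / n_samples for state in JOINT}
```

Because each block is drawn from its exact conditional, the visit frequencies approach the joint table as the number of samples grows, and the spread of the retained samples is what can be used to describe posterior uncertainty.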

CiteSeerX

How accurately is ncRNA aligned within whole-genome multiple alignments?

Author: A Prakash
A Siepel
Adrienne X Wang
DA Pollard
E Rivas
E Torarinsson
EH Margulies
G Bourque
J Pei
JD Thompson
L Wang
M Blanchette
M Brudno
M Cline
M Errami
Martin Tompa
MS Rosenberg
S Batzoglou
S Griffiths-Jones
S Karlin
S Kumar
S Schwartz
S Washietl
SR Eddy
T Lassmann
W Miller
Walter L Ruzzo
WJ Kent
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Multiple alignment of homologous DNA sequences is of great interest to biologists, since it provides a window into evolutionary processes. At present, the accuracy of whole-genome multiple alignments, particularly in noncoding regions, has not been thoroughly evaluated.

Results: We evaluate the alignment accuracy of certain noncoding regions using noncoding RNA alignments from Rfam as a reference. We inspect the MULTIZ 17-vertebrate alignment from the UCSC Genome Browser for all the human sequences in the Rfam seed alignments. In particular, we find 638 instances of chimeric and partial alignments to human noncoding RNA elements, of which at least 225 can be improved by straightforward means. As a byproduct of our procedure, we predict many novel instances of known ncRNA families that are suggested by the alignment.

Conclusion: MULTIZ does a fairly accurate job of aligning these genomes in these difficult regions. However, our experiments indicate that better alignments exist in some regions.
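One standard way to score a test alignment against a reference such as an Rfam seed alignment is the fraction of reference residue pairs that the test alignment reproduces (sum-of-pairs sensitivity). The Python sketch below computes it for two sequences. It is a generic accuracy measure offered for illustration, not the chimeric/partial-alignment analysis the authors performed, and `aligned_pairs` and `pair_sensitivity` are hypothetical names.

```python
def aligned_pairs(row1: str, row2: str, gap: str = "-") -> set:
    """Set of (i, j) ungapped residue indices that share an alignment column."""
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0
    for x, y in zip(row1, row2):
        if x != gap and y != gap:
            pairs.add((i, j))  # residue i of seq 1 aligned to residue j of seq 2
        if x != gap:
            i += 1
        if y != gap:
            j += 1
    return pairs


def pair_sensitivity(test: tuple, reference: tuple) -> float:
    """Fraction of reference-aligned residue pairs also aligned in the test.

    Both arguments are (row1, row2) tuples over the same ungapped sequences.
    """
    ref = aligned_pairs(*reference)
    if not ref:
        return 0.0
    return len(ref & aligned_pairs(*test)) / len(ref)
```

For example, a test alignment `("AC-GU", "ACG-U")` recovers 3 of the 4 residue pairs in the gapless reference `("ACGU", "ACGU")`, giving a sensitivity of 0.75.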

Directory of Open Access Journals